
<h1>Creating a Client-Side Search Engine With Gears</h1>
<div class='byline-author'>Brad Neuberg, Gears Team, July 2008</div>

<h2>Summary</h2>
<p>
  This article describes how Gears can be used to create a client-side search engine plugged right into your web page. Learn how to add this functionality to your own web site, then dive deep to see how Gears and the Dojo toolkit were combined to create this client-side search engine. Readers should have experience with JavaScript and a basic understanding of Gears.
</p>

<h2>Introduction</h2>

<p>
  Did you know that you can use <a title="Gears" target="_blank" href="http://gears.google.com">Gears</a> to do fast, client-side searching of data, similar to a client-side search engine? Gears bundles <a title="Full-Text Search" target="_blank" href="http://code.google.com/apis/gears/api_database.html#sqlite_fts">Full-Text Search</a> (FTS) abilities right into its local, <a title="SQLite database" target="_blank" href="http://code.google.com/apis/gears/api_database.html">SQLite database</a>. <a title="MySpace" target="_blank" href="http://www.myspace.com">MySpace</a>, for example, <a title="recently used this feature" target="_blank" href="http://gearsblog.blogspot.com/2008/05/myspace-message-center-is-now-searching.html">uses this feature</a> with their MySpace Mail application, downloading all of a user's messages for fast, client-side search. Because all of the data is local, you can do nifty things like search over the data in real-time as the user types, something that is much harder if you have to query over the network to a server to do the searching.
</p>

<p>
  Would you like to add the same kind of fast, local searching to your own web page and web applications? This article introduces you to PubTools Search and the Gears features that power it, namely Full-Text Search and Workers. <a title="PubTools Search" target="_blank" href="http://code.google.com/p/gears-pubtools">PubTools Search</a> is an open source JavaScript library that drops a client-side search engine right into your page. You configure it with basic HTML plus a list of URLs to index. Once loaded, a search form that uses the local Gears full-text search abilities will appear in your page to quickly and locally search over the documents in real time as a user types into the search field.
</p>

<p>
  Please note that PubTools Search is not an official Google project or Gears API; it is a project I created on my own to teach and help developers. The Gears team does not support this project. However, <a title="email me" target="_blank" href="http://tinymailto.com/bradneuberggoogle">email me</a> if you have questions, concerns, or patches while working with the code.
</p>

<p>
  Give PubTools Search a try right now with the demo embedded into this page below! The search form below uses PubTools Search to grab and index several free, public domain books from <a target="_blank" href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a>, including books by <a target="_blank" href="http://gears-pubtools.googlecode.com/svn/trunk/demos/resources/goethe.txt">Goethe</a>, <a target="_blank" href="http://code.google.com/p/gears-pubtools/source/browse/trunk/demos/resources/descartes.txt">Descartes</a>, <a target="_blank" href="http://gears-pubtools.googlecode.com/svn/trunk/demos/resources/goldman.txt">Emma Goldman</a>, and <a target="_blank" href="http://gears-pubtools.googlecode.com/svn/trunk/demos/resources/">more</a>. Type the word 'history' and notice that, as you type, results are returned almost instantly because everything happens locally. If for some reason the search form does not appear below, you can run it <a href="http://codinginparadise.org/projects/gears_pubtools/latest/demos/search/demo.html">here</a>.
</p>

<iframe src="http://codinginparadise.org/projects/gears_pubtools/latest/demos/search/iframe.html" style='overflow: visible; width: 60%; height: 25em; border: 0px;'></iframe>

<p>PubTools Search is also a useful educational tool: its source code shows developers how to use Gears <a title="Workers" target="_blank" href="http://code.google.com/apis/gears/api_workerpool.html">Workers</a> and FTS together in their own web applications, along with best practices for performance and reliability when working with them.</p>

<p>
  This article covers the following:
</p>

<ul>
  <li>An introduction to Gears' Full-Text Search and Worker features</li>
  <li>How to drop PubTools Search into your page to quickly get going</li>
  <li>Deep walkthrough and dissection of how PubTools Search works internally with source code and snippets so you can use these Gears features in your own applications</li>
  <li>An introduction to parts of the <a href="http://dojotoolkit.org">Dojo toolkit</a> used in PubTools Search, using actual source from PubTools Search including a discussion on some of the techniques used in modern JavaScript development</li>
  <li>Tips and tricks when working with Gears</li>
</ul>

<p>
  By the end of this article you should have a better grasp of Gears, PubTools Search, and Dojo, including how to use the Gears Full-Text Search and Workers in your own web applications for fast, client-side search.
</p>

<h2>Full-Text Search and Gears Workers</h2>

<p>Gears <a href="http://www.sqlite.org/">bundles a local relational database</a> that web sites can tap into and use. While a relational database can store and query the data that is present, traditionally a database cannot efficiently match words inside large documents. For example, I couldn't efficiently ask for all the rows whose text contains the word 'orange'. This is the role of a search index, which builds up a fast way to find all documents that contain some term. SQLite, using the <a href="http://www.sqlite.org/cvstrac/wiki?p=FtsTwo">fts2 module</a> bundled with Gears, can create <a href="http://code.google.com/apis/gears/api_database.html#sqlite_fts">special 'virtual' tables</a> that are in fact backed by search indexes so that you can quickly find matches for search terms using special SQL. This is known as <a target="_blank" href="http://en.wikipedia.org/wiki/Full_text_search">Full-Text Search (FTS)</a>.</p>

<p>
  Full-Text Search in Gears allows you to <a target="_blank" href="http://code.google.com/apis/gears/api_database.html#sqlite_fts">create special tables</a> in the local relational database. When you insert data into these tables, the data is indexed in such a way that searching over all the data to find full or partial matches is very fast. In essence, Gears and FTS give you the ability to roll your own client-side search engines that can work with very specific kinds of data, such as your corporate directory, a corpus of documents, and more.
</p>
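
<p>
  As a rough sketch of what this looks like in code, the helpers below use the Gears Database API's <code>execute</code> method and result-set iteration to create an fts2 table, add a row, and query it with <code>MATCH</code>. The table name <code>docs</code>, its columns, and the helper names are made up for illustration and are not taken from PubTools Search; the <code>db</code> argument is assumed to be a Gears database object that has already been opened.
</p>

<pre class="prettyprint">
// Hypothetical helpers; db is assumed to be an opened Gears database,
// for example one obtained with google.gears.factory.create('beta.database').

// Create a full-text 'virtual' table backed by the fts2 module.
function createSearchTable(db) {
  db.execute('CREATE VIRTUAL TABLE docs USING fts2(url, content)');
}

// Inserting a row is what causes its text to be indexed.
function addDocument(db, url, content) {
  db.execute('INSERT INTO docs (url, content) VALUES (?, ?)', [url, content]);
}

// Return the URLs of all documents whose indexed content matches the term.
function findUrls(db, term) {
  var urls = [];
  var rs = db.execute('SELECT url FROM docs WHERE content MATCH ?', [term]);
  while (rs.isValidRow()) {
    urls.push(rs.field(0));
    rs.next();
  }
  rs.close();
  return urls;
}
</pre>

<p>
  Passing the database in as an argument keeps the helpers easy to exercise with a stand-in object when Gears is not installed.
</p>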

<p>
  Writing to the database and searching over the FTS table can be quite intense, since it is hitting the hard drive at regular intervals. In traditional web applications without Gears everything JavaScript does has to run on the same thread as the web browser. If JavaScript does something intensive, the browser itself slows to a crawl and freezes. Gears <a href="http://code.google.com/apis/gears/api_workerpool.html">Workers</a> are a way to run JavaScript on threads separate from the browser's user-interface, allowing you to do considerable work while keeping the browser responsive. As you will see in this article, PubTools Search uses Workers to do all of its FTS database operations, ensuring the browser stays fast.
</p>
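
<p>
  To make the Worker model concrete, here is a minimal sketch that hands a string to a worker and receives a reply through <code>onmessage</code>. It relies on the WorkerPool calls <code>createWorker</code>, <code>sendMessage</code>, and <code>onmessage</code>; the helper name <code>countOnWorker</code> and the trivial "return the length" job are hypothetical and only stand in for real work such as database access. The <code>workerPool</code> argument is assumed to come from <code>google.gears.factory.create('beta.workerpool')</code>.
</p>

<pre class="prettyprint">
// Source for the child worker, kept as a string because createWorker()
// takes the worker's JavaScript as text. This worker replies with the
// length of whatever string it is sent.
var workerSource =
    'google.gears.workerPool.onmessage = function(messageText, senderId) {' +
    '  var reply = String(messageText.length);' +
    '  google.gears.workerPool.sendMessage(reply, senderId);' +
    '};';

// Hypothetical helper: run the job on a worker and report the result
// through a callback, so the calling code never blocks.
function countOnWorker(workerPool, text, callback) {
  // Replies from the worker arrive here as messages.
  workerPool.onmessage = function(messageText, senderId) {
    callback(Number(messageText));
  };
  var workerId = workerPool.createWorker(workerSource);
  workerPool.sendMessage(text, workerId);
}
</pre>

<p>
  Workers share no state with the page; everything crosses the boundary as a message, which is why the result comes back through a callback rather than a return value.
</p>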

<h2>Using PubTools Search</h2>

<p>
  In general, Gears tends to give you lower-level primitives, such as FTS and Workers. It is the job of developers to tie these together using JavaScript to create higher-level applications that use Gears. If you're just getting started with web development and Gears this can be a lot to absorb all at once. In the first part of this article I'll show you how you can use PubTools Search to get some of the great abilities of Gears into your page without having to delve into JavaScript, just by sprinkling a little bit of HTML into your page. In the second part I'll delve into the internals of PubTools Search so you can use these Gears features inside your web application itself.
</p>

<p>
  To use PubTools Search in your web page, first download the following files and put them on your web server. PubTools Search is under an Apache v2 license so you can safely use this in commercial projects. You can also <a href="http://code.google.com/p/gears-pubtools/downloads/list">download the PubTools Search ZIP file</a> and find these files in the <code>src/</code> directory.
</p>

<ul>
  <li><a href="http://gears-pubtools.googlecode.com/svn/trunk/src/pubtools-search.css">pubtools-search.css</a></li>
  <li><a href="http://gears-pubtools.googlecode.com/svn/trunk/src/pubtools-search.js">pubtools-search.js</a></li>
  <li><a href="http://gears-pubtools.googlecode.com/svn/trunk/src/pubtools-util.js">pubtools-util.js</a></li>
</ul>

<p>
  Second, add the PubTools Search CSS and JavaScript to your web page in the <code>HEAD</code> portion of your HTML document:
</p>

<pre class="prettyprint">
&lt;link rel="stylesheet" type="text/css" href="pubtools-search.css"&gt;

&lt;script type="text/javascript" src="pubtools-util.js"&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="pubtools-search.js"&gt;&lt;/script&gt;
</pre>

<p>
  Next, create a file named <code>search.txt</code> in the same directory as your HTML page. This file will contain a list of URLs that you want to index locally and search over. The URLs are relative to the HTML file. Here is the <code>search.txt</code> file used in the example PubTools Search form embedded into this article above:
</p>

<pre>
version=0.6
../resources/descartes.txt
../resources/goethe.txt
../resources/goldman.txt
../resources/machiavelli.txt
../resources/montaigne.txt
../resources/kafka.html
../resources/plato.html
</pre>

<p>
  The first line must have a version, such as <code>version=0.6</code>. Change this version whenever you add, modify, or remove a document; when PubTools Search sees a new version it will reindex the documents. After the version line, list the URLs to index, one per line.
</p>
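
<p>
  To illustrate the file format, here is a small sketch of a parser for it. The function name <code>parseManifest</code> and the shape of the object it returns are hypothetical; this is not the parsing code from PubTools Search, only one way the rules above could be applied.
</p>

<pre class="prettyprint">
// Hypothetical parser for a search manifest such as search.txt.
// Returns an object with the version string and the list of URLs.
function parseManifest(text) {
  var manifest = { version: null, urls: [] };

  text.split(/\r?\n/).forEach(function(rawLine) {
    var line = rawLine.replace(/^\s+|\s+$/g, '');
    if (line === '') {
      return; // skip blank lines
    }
    if (manifest.version === null) {
      // The first non-blank line must look like "version=0.6".
      var match = /^version=(.+)$/.exec(line);
      if (match === null) {
        throw new Error('Manifest must start with a version line');
      }
      manifest.version = match[1];
    } else {
      // Every later line is a URL to index.
      manifest.urls.push(line);
    }
  });

  if (manifest.version === null) {
    throw new Error('Manifest must start with a version line');
  }
  return manifest;
}
</pre>

<p>
  Keeping the version as a string means it only ever needs to be compared for equality with the previously stored value, never ordered.
</p>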

<p>
  The client-side search can only index HTML, text, and XML files. A good
tip is to make sure any HTML documents you want to index have a <code>TITLE</code>
element in the <code>HEAD</code>; this will be the title of the document that is returned for search results. For XML and text files we use the URL as the title.
</p>

<p>
  Finally, add a DIV to your page with the following ID:
</p>

<pre class="prettyprint">
&lt;div id='st-widget'&gt;&lt;/div&gt;
</pre>

<p>
  When PubTools Search loads it will put the search UI box into this DIV.
</p>

<p>
  That's it, you're done! You can see sample HTML <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/demos/search/demo.html">here</a>. You can also override the filename for the list of search URLs and the ID of the search widget; see the <a href="http://gears-pubtools.googlecode.com/svn/trunk/README.txt">README file</a> for details.
</p>

<h2>Dissecting PubTools Search</h2>

<p>
  Let's jump right in and delve into the code for PubTools Search. First, all of the source is in a single file that you can refer to, <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41"><code>pubtools-search.js</code></a>. Here are the JavaScript classes and their responsibilities in PubTools Search:
</p>

<ul>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#38"><code>SearchTools</code></a> - A singleton that exposes our public API and kicks things off. Grabs configuration options for PubTools Search; creates our client-side database; and acts as the central manager.</li>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#232"><code>UI</code></a> - Encapsulates the search user-interface, responding as the user types. Also takes a list of search results and can display them on the page.</li>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#452"><code>SearchManifest</code></a> - Fetches a search manifest file, such as <a href="http://gears-pubtools.googlecode.com/svn/trunk/demos/search/search.txt"><code>search.txt</code></a> so that we can get a list of URLs to work with. Also manages the version number of this manifest, persisting it into the client-side database and determining if we even need to re-index this material, since we might already have indexed it locally.</li>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#615"><code>Documents</code></a> - Takes an array of URLs from a manifest and fetches each document's contents, returning textual (HTML, XML, text) documents that can be indexed.</li>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#743"><code>Indexer</code></a> - Indexes a set of documents into Gears' full-text search client-side database.</li>
  <li><a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#859"><code>Searcher</code></a> - Searches through Gears' client-side database for documents that match a given query, producing results and a snippet for each document.</li>
</ul>

<p>
  Here is a diagram of the classes and their methods. Private methods have an _ appended to the end. Note that some methods are left out to prevent clutter. Looking over the diagram and seeing the method names gives an overall impression of the PubTools Search system:
</p>

<p>
  <a href="ps_class_diagram.png"><center><img style="width: 675px; height: 506px;" src="ps_class_diagram.png" /></center></a>
</p>

<p>
  Let's run through the steps involved in the two most interesting phases, indexing and searching. 
</p>

<h3>Indexing</h3>

<p>
  I'll describe the indexing steps from a high level, but if you want to see the control flow in depth you can view the diagram below. In the diagram, you will see a small 'Async!' label if the operation happens asynchronously with a callback; an icon of tools if the action happens on a Gears Worker; and an icon of a disk if we are working with the local database and FTS table.
</p>

<p>
  <a href="ps_indexing_flow.png"><center><img style="width: 256px; height: 600px;" src="ps_indexing_flow.png" /></center></a>
</p>

<p>
  Indexing involves the following high-level steps; as I describe the steps I've hyperlinked them right into the methods involved so you can study the source yourself and follow along. First, the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#38">SearchTools class</a> <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#169">determines our database name</a> and then <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#186">sets up the schema for our local database</a>. When the page is <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#90">finished loading</a>, we then initialize the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#232">UI class</a> and have it <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#255">embed itself</a> into the page.
</p>

<p>
  Next, we kick off downloading our search file, generally named <code>search.txt</code>, that has the URLs to index and the version of all these files by using the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#452">SearchManifest class</a>. Note that while Gears has a <a href="http://code.google.com/apis/gears/api_localserver.html#manifest_file">JSON-based manifest file</a> that is used by the Gears <a href="http://code.google.com/apis/gears/api_localserver.html">LocalServer</a>, reusing this for PubTools Search was not appropriate because a Gears LocalServer manifest can have many files that you would not want to index into a local search engine, such as JavaScript, CSS, etc. The SearchManifest class <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#469">fetches</a> the search file and then <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#507">parses</a> it into a form we can use. We then <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#547">process</a> and extract the version number from the file and <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#563">compare it</a> to what we might have stored in the local database. We don't want to re-index all of our documents every time the page loads; we only want to do so when the document list has changed. If the version hasn't changed, then we are done and don't have to index; otherwise we <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#592">store the new version</a> into our Gears Database and continue.
</p>

<p>
  Now that we have a list of URLs to download, we can instantiate the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#615">Documents class</a>. This class initially tries to determine the <a href="http://en.wikipedia.org/wiki/MIME">MIME type</a> of all the URLs we have. This is for two reasons. One, we don't want to accidentally download and try to work with a binary file, such as a Microsoft Word file. Second, we will need the MIME type later on when doing snippet and title extraction. We determine the MIME type and do <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#631">filtering</a> by issuing an HTTP <code>HEAD</code> request to the server. We filter out any URL that is not an XML, HTML, or text document. Once we've got our final list of URLs to work with we can <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#707">download</a> the documents.
</p>
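
<p>
  The filtering decision itself can be sketched as a small function over the <code>Content-Type</code> value returned by the <code>HEAD</code> request. The helper name <code>isIndexable</code> and the exact list of accepted MIME types are assumptions made for illustration; the PubTools Search source may accept a different set.
</p>

<pre class="prettyprint">
// Hypothetical helper: decide from a Content-Type header value whether a
// document is one of the textual types we can index.
function isIndexable(contentType) {
  if (!contentType) {
    return false;
  }
  // Drop parameters such as "; charset=utf-8" and normalize case.
  var mimeType = contentType.split(';')[0]
      .replace(/^\s+|\s+$/g, '')
      .toLowerCase();

  // Assumed whitelist of HTML, plain text, and XML types.
  return mimeType === 'text/html' ||
         mimeType === 'text/plain' ||
         mimeType === 'text/xml' ||
         mimeType === 'application/xml' ||
         mimeType === 'application/xhtml+xml';
}
</pre>

<p>
  A whitelist fails safe: an unknown or missing type, such as a Microsoft Word file's, is skipped rather than downloaded.
</p>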

<p>
  At this point we are armed with the data we need to do the real work, indexing. We instantiate the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#743">Indexer class</a> and pass it all of our documents' contents, their URLs, and their MIME types to <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#758">index</a>. The Indexer does two big things: it determines a good title for each document so we can show it in search results, and it saves the document into a Full-Text Search table in our local database. Since <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#828">getting a title</a> and <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#808">doing the save</a> can be computationally expensive, <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#772">we do them on a Gears Worker</a> to keep the browser responsive. At this point we are done with indexing.
</p>
  
<h3>Searching</h3>

<p>
  Let's look at the searching side now. For low-level control details you can see the following diagram:
</p>

<p>
  <a href="ps_searching_flow.png"><center><img style="width: 559px; height: 600px;" src="ps_searching_flow.png" /></center></a>
</p>

<p>
  When a user <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#300">types into the search field</a>, we kick off a <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#55">search</a>. The hard work behind searching is handled by the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#859">Searcher class</a>. Just like indexing, <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#864">we do almost everything on a Gears Worker</a> to keep the browser responsive. First, <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#895">we search our local database's Full-Text Search table</a> for results, then <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#948">generate snippets</a> based on the query string before returning the results. These results get sent to the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#232">UI class</a> and <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#358">printed to the page</a>.
</p>
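
<p>
  Snippet generation can be pictured with a simple sketch: find the query in the document text and return a window of characters around it. This is an assumption about the general idea only; the function name <code>makeSnippet</code> and its <code>radius</code> parameter are hypothetical, it handles plain text only, and the real <code>getSnippet_</code> method also takes the document's MIME type into account.
</p>

<pre class="prettyprint">
// Hypothetical snippet builder for plain-text content. Returns the text
// surrounding the first case-insensitive occurrence of the query, or an
// empty string if the query does not occur.
function makeSnippet(query, content, radius) {
  var index = content.toLowerCase().indexOf(query.toLowerCase());
  if (index === -1) {
    return '';
  }
  var start = Math.max(0, index - radius);
  var end = Math.min(content.length, index + query.length + radius);

  // Mark truncated ends with an ellipsis.
  var prefix = start === 0 ? '' : '...';
  var suffix = end === content.length ? '' : '...';
  return prefix + content.substring(start, end) + suffix;
}

// Example: three characters of context on each side of the match.
var example = makeSnippet('history', 'A short history of time', 3);
// example is '...rt history of...'
</pre>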

<h3>Dojo &amp; Code Snippets</h3>

<p>
  Now that you've seen how PubTools Search works from a high level, we can begin to look at actual source code snippets. First, I'll need to give a quick introduction to <a href="http://dojotoolkit.org">Dojo</a>, an open source JavaScript library that PubTools Search uses as a utility library, since many of the code snippets use Dojo routines.
</p>

<p>
  Dojo is a popular open source JavaScript and Ajax toolkit. It actually consists of three major pieces:
</p>

<ul>
  <li><a href="http://dojotoolkit.org/projects/core">Dojo Core</a> - A small but very feature packed library of routines, only 24K when sent from a server with <a href="http://en.wikipedia.org/wiki/Gzip#Other_uses">GZip encoding</a>. Includes Ajax routines, events, packaging, CSS-based querying, animations, JSON, language utilities, and a bunch more.</li>
  <li><a href="http://dojotoolkit.org/projects/dijit">Dijit</a> - Skinnable, template-driven widgets with accessibility and localization built in.</li>
  <li><a href="http://dojotoolkit.org/projects/dojox">DojoX</a> - Lots of cool extensions and libraries, including <a href="http://dojotoolkit.org/offline">offline with Gears</a>, local storage, cross-browser vector drawing, and more.</li>    
</ul>

<p>
  PubTools Search uses just the Dojo Core piece. Dojo includes a special <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-4-meta-dojo/package-system-and-custom-builds">build system</a> that can be used to do all sorts of nifty optimizations. PubTools Search bundles a small, special build of Dojo that re-namespaces the <code>dojo</code> object to be <code>pu</code>, for PubUtils. This was done so that pages that include PubTools Search and are already using Dojo won't get code collisions. For example, instead of calling <code>dojo.hitch()</code>, the PubTools Search code calls <code>pu.hitch()</code>. The build of Dojo that PubTools Search uses is named <code>pubtools-util.js</code>.
</p>

<p>
  Let's look at some of the Dojo functions that PubTools Search uses in the context of actual code that PubTools Search uses. This will give you a chance to both get familiar with Dojo as well as learn how PubTools Search ties Gears' functionality together.
</p>

<h4><code>dojo.xhrGet</code> and <code>dojo.forEach</code></h4>
<p>
  Let's take a look at doing <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/ajax-transports">XMLHttpRequests with Dojo's <code>xhrGet</code> function</a>. We will look at the <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js?r=41#707"><code>Documents.download_</code></a> function as reference:
</p>

<pre class="prettyprint">
download_: function(downloadMe) {
  var idx = new Indexer(downloadMe.length);

  pu.forEach(downloadMe, function(entry) {
    var url = entry.url;
    var mimeType = entry.mimeType;

    ui.tickProgress();

    pu.xhrGet({
      url: url,

      load: function(data) {
        ui.tickProgress();
        idx.index(url, mimeType, data);

        return data;
      },

      error: pu.hitch(this, function(err) {
        searchTools.handleError(err);

        return err;
      })
    });
  });
}
</pre>

<p>
  First, we create an <code>Indexer</code> instance. <code>downloadMe</code> is an array of entries to download, each carrying a <code>url</code> and a <code>mimeType</code>, already filtered by MIME type to just the ones we can work with. <a href="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.forEach"><code>pu.forEach</code></a> is a useful Dojo function that takes an array, loops over it, and calls the given function once per element, handing it that element to work with. <code>forEach</code> can help to make your code tighter and more readable in some cases. In the code above we use this to get each URL and its MIME type.
</p>

<p>
  When we get an <code>entry</code> to work with, we can call <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/ajax-transports"><code>pu.xhrGet</code></a> to fetch the given URL. Dojo's <code>xhrGet</code> function is straightforward to work with; it takes an <a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Guide:Literals#Object_Literals">object literal</a> of arguments, including the <code>url</code> to load; a <code>load</code> function that is called when the response has been returned; and an <code>error</code> function that will be called if there is an error. In the <code>download_</code> function above, once we get a document back we pass it to the <code>Indexer</code>, which stores it for later indexing. We used to index each document as it came in; however, performance testing showed that it is much faster to index a large set of documents in one shot using SQLite transactions rather than individually.
</p>  
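
<p>
  The "one shot" approach can be sketched as follows: wrap all the inserts in a single transaction by executing <code>BEGIN</code> and <code>COMMIT</code> through the Gears database object. The function name <code>indexAll</code>, the table name <code>docs</code>, and its columns are hypothetical and not taken from the PubTools Search source; <code>db</code> is assumed to be an opened Gears database.
</p>

<pre class="prettyprint">
// Hypothetical batch insert: db is assumed to be an opened Gears database
// and docs an FTS table with url and content columns (made-up names).
function indexAll(db, documents) {
  db.execute('BEGIN');
  try {
    documents.forEach(function(doc) {
      db.execute('INSERT INTO docs (url, content) VALUES (?, ?)',
                 [doc.url, doc.content]);
    });
    // A single commit makes the whole batch visible at once.
    db.execute('COMMIT');
  } catch (e) {
    // Undo the partial batch so the index is never half-written.
    db.execute('ROLLBACK');
    throw e;
  }
}
</pre>

<p>
  Without an explicit transaction, each insert is committed on its own, which is where most of the per-document cost comes from.
</p>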

<h4><code>dojo.declare</code></h4>

<p>
  JavaScript is a <a href="http://en.wikipedia.org/wiki/Prototype-based_programming">prototype-based</a> programming language that can be used to <a href="http://www.devarticles.com/c/a/JavaScript/Object-Oriented-JavaScript-Using-the-Prototype-Property/">emulate</a> object-oriented programming. While this is fine for smaller projects, sometimes using a simpler syntax than the standard JavaScript prototype-based notation can make your code a bit more readable and maintainable. Dojo's <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/object-orientation/declaring-class"><code>declare</code></a> method makes it easy to declare a class. Here is the class definition for the <code>Searcher</code> class in PubTools Search:
</p>

<pre class="prettyprint">
  pu.declare('Searcher', null, {
      search: function(query, callback) {
      },

      escapeString_: function(str) {
      },

      getSnippet_: function(query, mimeType, content) {     
      }
  });
</pre>

<p>
  The <code>declare</code> function takes a class name, in this case <code>'Searcher'</code>; an optional super-class, <code>null</code> in the code above since we don't subclass; and finally an object literal of functions to add to this class. You can also have a special method named <code>constructor</code> that will be run when an instance of the class is created. Once you've defined your class, you create an instance of it with the standard JavaScript <code>new</code> keyword:
</p>

<pre class="prettyprint">
  var s = new Searcher();
</pre>

<h4><code>dojo.hitch</code></h4>

<p>
  In JavaScript programming <a href="http://blog.morrisjohns.com/javascript_closures_for_dummies.html">closures</a> are your friend. Explaining closures is beyond the scope of this article, but their use can be illustrated with the following example.
</p>

<p>
  When developers first encounter JavaScript and need to create an event listener while doing JavaScript object-oriented programming, they typically do something like the following:
</p>

<pre class="prettyprint">
  // define a class named MyClass
  function MyClass(msg) {
    this.msg = msg;
    
    // have the doSomething() method get called when a button is clicked
    var button = document.getElementById('myButton');
    button.onclick = this.doSomething;
  }
  
  MyClass.prototype.doSomething = function() {
    alert('Hello World! Our message is ' + this.msg);
  }
  
  // create an instance of MyClass
  var instance = new MyClass('some message');
</pre>

<p>
  This won't work! In JavaScript, functions are first class citizens and aren't necessarily 'bound' to an instance; they can be passed around by themselves. When the <code>onclick</code> handler gets called after a click, the <code>doSomething</code> function is called, but not as a method of our instance. Inside of <code>doSomething</code>, we try to print out the message we passed in with <code>this.msg</code>. However, <code>this</code> doesn't refer to our instance! Instead, it refers to the button element the handler was assigned to, which has no <code>msg</code> property, so we see 'Hello World! Our message is undefined' printed out. More advanced JavaScript programmers then do something like this to make sure that <code>doSomething</code> is attached to our instance:
</p>

<pre class="prettyprint">
  // define a class named MyClass
  function MyClass(msg) {
    this.msg = msg;
    
    // have the doSomething() method get called when a button is clicked
    var button = document.getElementById('myButton');
    <b>  
    // make a closure to capture our instance, then call the method when
    // the button is clicked
    var self = this;
    button.onclick = function() {
      self.doSomething();
    }</b>
  }
  
  MyClass.prototype.doSomething = function() {
    alert('Hello World! Our message is ' + this.msg);
  }
  
  // create an instance of MyClass
  var instance = new MyClass('some message');
</pre>

<p>
  We've used a closure to 'capture' the instance we are working with; then, when the button is clicked the <code>doSomething</code> method is called and the value of <code>this.msg</code> is valid and prints out correctly.
</p>

<p>
  This pattern shows up all the time in JavaScript, especially when you are working with asynchronous tasks that involve callbacks such as the network and Gears Workers; if you aren't careful your code can get ugly with lots of <code>self</code> variables and tricky bugs. Dojo provides a convenient <a href="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.hitch"><code>hitch</code></a> method that makes this pattern cleaner and your code more compact. Here is a code snippet from the PubTools Search <code>Documents</code> class that uses <code>hitch</code>:
</p>
  
<pre class="prettyprint">
pu.declare('Documents', null, {
    constructor: function(urls) {
      this.filter_(urls, pu.hitch(this, this.download_));
    },

    filter_: function(urls, callback) {
      // filter the URLs here asynchronously
      // ...
      callback(filteredURLs)
    },

    download_: function(downloadMe) {
      this.doSomething_();
    }
});
</pre>

<p>
  Remember that the <code>Documents</code> class is tasked with taking a list of URLs, filtering them down and getting their MIME types, and then downloading the documents' contents for indexing. Both filtering and downloading the documents are asynchronous tasks involving the network, so they need callbacks. Imagine if the line in our constructor was the following:
</p>

<pre class="prettyprint">
constructor: function(urls) {
  this.filter_(urls, this.download_); // bad!
}
</pre>

<p>
  The <code>filter_</code> function takes some URLs to filter and a callback that will be called when we are done. In the above incorrect code snippet, when the filtering is done the <code>download_</code> method is run. However, just like our event handling code, the <code>download_</code> method will no longer run in the context of our <code>Documents</code> instance! If <code>download_</code> tries to reference instance variables and functions, such as <code>this.doSomething_()</code>, then they will be undefined and the call will fail. The Dojo <code>hitch</code> method makes it easy to <em>bind</em> a given instance and function, returning another function that can be safely used in a callback or event handler. Here is the correct code:
</p>
  
<pre class="prettyprint">
constructor: function(urls) {
  this.filter_(urls, pu.hitch(this, this.download_));
},
</pre>
  
<p>
  <code>this.download_</code> and <code>this</code>, referring to the <code>Documents</code> instance, are now hitched. When filtering is done it will correctly call the <code>download_</code> method and everything will still be running in the context of the <code>Documents</code> instance. 
</p>
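<p>
  To make the mechanics concrete, here is a minimal sketch of what a <code>hitch</code>-style helper does. This is an illustration only, not Dojo's actual implementation; the <code>hitch</code> function, the simplified <code>Documents</code> constructor, and the <code>downloaded</code> property below are stand-ins written for this example.
</p>

```javascript
// Illustrative stand-in for a hitch-style helper (not Dojo's real code).
// It returns a new function that always calls `method` with `scope` as `this`.
function hitch(scope, method) {
  return function() {
    return method.apply(scope, arguments);
  };
}

// Simplified, hypothetical version of the Documents class for this sketch.
function Documents() {
  this.downloaded = [];
}

Documents.prototype.filter_ = function(urls, callback) {
  // Stand-in for the real asynchronous filtering: keep only .html URLs.
  var filteredURLs = [];
  for (var i = 0; i < urls.length; i++) {
    if (/\.html$/.test(urls[i])) {
      filteredURLs.push(urls[i]);
    }
  }
  // The callback is invoked as a bare function, so whatever `this` it sees
  // is decided by how the callback was created, not by filter_.
  callback(filteredURLs);
};

Documents.prototype.download_ = function(downloadMe) {
  // Relies on `this` being the Documents instance.
  this.downloaded = downloadMe;
};

var docs = new Documents();
docs.filter_(['a.html', 'b.png'], hitch(docs, docs.download_));
```

<p>
  Because the returned function closes over <code>scope</code>, it no longer matters how or where the callback is eventually invoked; <code>download_</code> always sees the instance it was hitched to.
</p>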
  
<h4><code>dojo.byId</code> &amp; <code>dojo.query</code></h4>

<p>
  Most JavaScript toolkits provide convenience functions for quickly getting DOM elements on the page, so that you don't have to constantly litter your code with the DOM standard's verbose <code>document.getElementById</code> method. Dojo provides a similar method, <code>dojo.byId</code>. Here we are getting the PubTools Search <code>DIV</code> that a developer provides in their HTML so we can fill in the UI:
</p>
  
<pre class="prettyprint">
var w = pu.byId('st-widget');
</pre>  

<p>
  Recently a new practice has developed of binding your JavaScript's behavior to the page using CSS Selectors rather than lots of DOM manipulation code, which can be very verbose. <a href="http://dojotoolkit.org/node/336">Large gains in performance</a> achieved by JavaScript toolkit authors have made this technique viable for large-scale projects, and the improvements in productivity and code maintainability are substantial. In <a href="http://jquery.com/">jQuery</a>, for example, this is the standard way you work with the page.
</p>

<p>
  Dojo includes similar functionality with <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/selecting-dom-nodes-dojo-query"><code>dojo.query</code></a>. Let's look at an example of how this is used in PubTools Search. In PubTools Search you can use a <code>LINK</code> tag in your HTML with a special <code>rel</code> name to override the default location and name of the search manifest:
</p>

<pre class="prettyprint">
&lt;link rel="search.urls" href="/some/file/path/search_me.txt"&gt;&lt;/link&gt;
</pre>

<p>
  When we start up, we want to see if this <code>LINK</code> tag is defined and get its value. Doing so using old W3C DOM manipulation would have involved getting all the <code>LINK</code> tags on the page (<code>document.getElementsByTagName('link')</code>), looping over all of them looking at the <code>rel</code> value, and then grabbing the one that matches and getting its value. This kind of older DOM code can get ugly and verbose fast. Here is how we do it using <code>dojo.query</code> in the <code>SearchTools</code> class:
</p>

<pre class="prettyprint">
getSearchManifestURL_: function() {
  var url = 'search.txt';
  var results = <b>pu.query("link[rel='search.urls']")</b>;
  if (results.length) {
    url = results[0].getAttribute('href');
  }
  
  return url;
}
</pre>

<p>
  Instead of lots of looping, we just use a simple CSS Selector, <code>link[rel='search.urls']</code>, which will match any <code>LINK</code> tags that have a <code>rel</code> attribute equal to <code>search.urls</code>. If it's present, we just grab its value and we are done.
</p>
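<p>
  For comparison, here is a rough sketch of the older looping approach described above. It is not taken from PubTools Search; the function name <code>getSearchManifestURLWithLoop</code> is made up for this example, and the document is passed in as a parameter so the function can be exercised outside a browser.
</p>

```javascript
// Hypothetical helper showing the pre-selector way of finding
// <link rel="search.urls">. `doc` is expected to behave like `document`.
function getSearchManifestURLWithLoop(doc) {
  var url = 'search.txt'; // same default as the dojo.query version
  var links = doc.getElementsByTagName('link');
  for (var i = 0; i < links.length; i++) {
    if (links[i].getAttribute('rel') === 'search.urls') {
      url = links[i].getAttribute('href');
      break; // first match wins, mirroring results[0]
    }
  }
  return url;
}
```

<p>
  In a page you would call <code>getSearchManifestURLWithLoop(document)</code>. The logic is the same as the selector version; the selector simply folds the loop and the attribute test into a single expression.
</p>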

<p>
  We've only scratched the surface of <code>dojo.query</code> and writing JavaScript code based on the CSS Selector idiom; <a href="http://dojotoolkit.org/book/dojo-book-0-9/part-3-programmatic-dijit-and-dojo/selecting-dom-nodes-dojo-query">see the chapter on this subject</a> in <a href="http://dojotoolkit.org/book/dojo-book-1-0">The Book of Dojo</a> for more details.
</p>

<h2>Tips, Tricks, &amp; Best Practices</h2>

<p>
  Now that you've seen some snippets of PubTools Search in conjunction with Dojo, let's look at some general Tips, Tricks, and <a href="http://code.google.com/apis/gears/gears_faq.html#Best__PracticesTOC">Best Practices</a> when using Gears. All of these are used in PubTools Search so you can always consult the source to get more details.
</p>

<ul>
  <li>Do intensive operations on Workers! This can dramatically improve performance and keep the browser responsive. In the first version of PubTools Search I did everything (indexing, searching, etc.) on the main browser thread, and the browser locked up and took minutes to finish. I refactored everything to Workers and operations took one or two seconds to finish versus minutes, with the browser remaining functional. See <a href="http://code.google.com/apis/gears/gears_faq.html#bestPracticeDB">Best Practices for Database Performance and Reliability</a> for more details on using the database and Workers together.</li>
  
  <li>Be careful when creating your database name; it's dangerous to just choose a simple string such as 'my-database', since Gears databases are scoped to an origin and another application served from the same origin could pick the same name and collide with yours. See <a href="http://code.google.com/p/gears-pubtools/source/browse/trunk/src/pubtools-search.js#192"><code>getDatabaseName_</code></a> for source to do this, as well as <a href="http://code.google.com/apis/gears/gears_faq.html#choosingNames">Best Practices for Choosing Names for Databases and LocalServers</a> for a larger discussion of the topic.</li>
  
  <li>Stringify your workers to make your code more maintainable. Here is a code snippet from PubTools Search in the <code>Indexer</code> class where we use this trick:<br />
    
    <pre class="prettyprint">
      index: function(url, mimeType, doc) {
            // build the worker's source by serializing our own methods
            var workerScript =  'var getTitle = ' 
                           + String(this.getTitle_) + '; '
                           + 'google.gears.workerPool.onmessage = ' 
                           + String(this.indexWorker_);
            // ...
      },

      indexWorker_: function(a, b, message) {
        // runs inside the worker
      },

      getTitle_: function(url, mimeType, doc) {
        // ...
      }
    </pre>
    
    <p>
      See <a href="http://code.google.com/apis/gears/gears_faq.html#maintainWorkers">Best Practices for Maintainable Workers</a> for more discussion on this topic.
    </p>
  </li>
  
  <li>When passing data between Workers, you can now pass JavaScript objects and not just strings (earlier versions of Gears did not allow this, but current versions do):<br />
    
    <pre class="prettyprint">
      index: function(url, mimeType, doc) {
        // ...

        var worker = google.gears.factory.create('beta.workerpool');
        var workerScript =  'var getTitle = ' 
                                + String(this.getTitle_) + '; '
                                + 'google.gears.workerPool.onmessage = ' 
                                + String(this.indexWorker_);
        var childWorkerId = worker.createWorker(workerScript);

        // ...

        // send the worker a message to run
        
        <b>// JavaScript object passed over to worker
        var msg = {dbName: dbName, indexMe: this.indexMe_};
        worker.sendMessage(msg, childWorkerId);</b>
      },

      indexWorker_: function(a, b, message) {
        <b>// use message.body instead of message.text to get the JavaScript
        // object
        var args = message.body;
        var indexMe = args.indexMe;</b>
        
        // ...
      }
    </pre>
  </li>
  
  <li>Use a closure for encapsulation. In PubTools Search we have a number of private classes that are used to organize things internally, such as the <code>Indexer</code> class. However, we don't want to pollute the global JavaScript namespace with a bunch of objects. Instead, we expose one object as part of a 'public' API, named <code>searchTools</code>, that script authors can hook into if they want more control, for example to create a custom UI:<br />
    
    <pre class="prettyprint">
      var searchTools = function() { // top of file

        pu.declare('SearchTools', null, { /* ... */ });
        pu.declare('UI', null, { /* ... */ });
        pu.declare('SearchManifest', null, { /* ... */ });
        pu.declare('Documents', null, { /* ... */ });
        pu.declare('Indexer', null, { /* ... */ });
        pu.declare('Searcher', null, { /* ... */ });

        searchTools = new SearchTools();
        return searchTools;
      }(); // bottom of file
    </pre>
  </li>
  
  <li>When working with lots of data and the Gears database, do two things:<br />
    <ul>
      <li>Batch everything up and do it in one shot.</li>
      <li>Wrap everything with an explicit transaction; if you don't, every call to the Gears <code>db.execute</code> method will create an implicit transaction which can dramatically slow things down.</li>
    </ul>
    
    <p>
      In one version of PubTools Search we would index documents individually as they came in; the thought was that we were taking advantage of the slow network time it takes for documents to download one at a time, and that if we indexed in between downloads the total pipeline would be very fast. However, profiling showed that it is in fact much faster to simply download all the documents and do the index operation in one shot, using the two pointers above. For a deeper discussion of this topic see <a href="http://code.google.com/apis/gears/gears_faq.html#bestPracticeDB">Best Practices for Database Performance and Reliability</a>.
    </p>
  </li>
</ul>
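<p>
  The batching and explicit-transaction advice above can be sketched roughly as follows. This is not the PubTools Search source: the function name <code>indexAllDocuments</code>, the table <code>docs_index</code>, and its columns are hypothetical, and <code>db</code> is assumed to be a Gears database object that has already been created and opened.
</p>

```javascript
// Sketch only. `db` is assumed to be an opened Gears database object;
// the table name and columns are hypothetical.
function indexAllDocuments(db, docs) {
  // One explicit transaction around the whole batch, instead of an
  // implicit transaction per db.execute call.
  db.execute('BEGIN');
  try {
    for (var i = 0; i < docs.length; i++) {
      db.execute(
          'INSERT INTO docs_index (url, title, content) VALUES (?, ?, ?)',
          [docs[i].url, docs[i].title, docs[i].content]);
    }
    db.execute('COMMIT');
  } catch (e) {
    // Undo the partial batch so the index is never left half-written.
    db.execute('ROLLBACK');
    throw e;
  }
}
```

<p>
  The caller collects all downloaded documents first and then hands the whole array to this function in one shot, ideally from inside a Worker so the main browser thread stays responsive.
</p>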

<h2>Conclusion</h2>  

<p>
  At this point you should have greater knowledge of Gears, Full-Text Search, Workers, PubTools Search, and Dojo. I look forward to seeing how you put these pieces together in your own applications! <a title="email me" target="_blank" href="http://tinymailto.com/bradneuberggoogle">Email me</a> if you have questions or have done something nifty with the pieces provided in this article.
</p>
<h2>Resources</h2>

<ul>
  <li>Watch the <a href="http://sites.google.com/site/io/creating-a-client-side-search-engine-with-gears">Creating a Client-Side Search Engine With Gears</a> presentation at Google I/O</li>
  <li><a href="http://code.google.com/p/gears-pubtools/downloads/list">Download PubTools Search</a></li>
  <li><a href="http://gears-pubtools.googlecode.com/svn/trunk/README.txt">PubTools README file</a></li>
  <li><a href="http://code.google.com/apis/gears">Gears Developer Community</a></li>
  <li><a href="http://code.google.com/apis/gears/design.html">Get started with Gears</a></li>
  <li><a href="http://code.google.com/apis/gears/gears_faq.html">Gears Developer FAQ</a> (A great resource for answers and code snippets!)</li>
  <li><a href="http://code.google.com/apis/gears/gears_faq.html#Best__PracticesTOC">Gears Best Practices</a></li>
  <li><a href="http://dojotoolkit.org">The Dojo Toolkit</a></li>
  <li><a href="http://dojotoolkit.org/book/dojo-book-1-0">The Book of Dojo</a></li>
  <li><a href="http://code.google.com/p/gears-pubtools/">PubTools Open Source Page</a></li>
  <li><a href="http://code.google.com/p/gears/">Gears Forums</a></li>
</ul>

<h2>Special Thanks</h2>

<p>
  Special thanks to <a href="http://www.thibaudlopez.net/">Thibaud Lopez Schneider</a> for great bug-testing, optimization, and many, many patches. Also thanks to Scott Hess for the awesome fts2 module that does full-text search in SQLite.
</p>
